
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning

Nagpal, Chirag, Venugopalan, Subhashini, Tobin, Jimmy, Ladewig, Marilyn, Heller, Katherine, Tomanek, Katrin

arXiv.org Artificial Intelligence

We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech with transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures, generalizing the LLM further to recognize disordered speech. While the resulting LLM does not outperform existing systems for speech recognition, we find that tuning with reinforcement learning using custom rewards leads to substantially better performance than supervised fine-tuning of the language model, specifically when adapting to speech in a different setting. This presents a compelling alternative tuning strategy for speech recognition using large language models.
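The abstract mentions rewards based on syntactic accuracy measures; a standard syntactic measure in ASR is word error rate (WER). A minimal sketch of such a reward, assuming WER is the chosen measure (the function names are illustrative, not taken from the paper):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def reward(reference: str, hypothesis: str) -> float:
    """Higher is better: 1.0 for a perfect transcript, lower as WER grows."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

In an RL loop, a scalar like this would score each sampled transcript against the reference; the paper's actual rewards also include semantic measures, which this sketch omits.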


Are Alexa and Siri AI?

FOX News

Angie Wisdom and Dr. Chirag Shah discuss how artificial intelligence could play a role in online and professional relationships. It might be some time before we see the futuristic concept of artificial intelligence that is depicted in science fiction novels and films come about in real life, but AI is still all around us. Most homes have some form of voice assistant gadget, such as an Alexa smart home device or Siri assistant on an iPhone. These machines can learn and respond in ways that resemble human cognitive abilities, all thanks to artificial intelligence algorithms. Alexa and Siri are applications powered by artificial intelligence.


A model that can recognize speech in different languages from a speaker's lip movements

#artificialintelligence

In recent years, deep learning techniques have achieved remarkable results in numerous language and image-processing tasks. This includes visual speech recognition (VSR), which entails identifying the content of speech solely by analyzing a speaker's lip movements. While some deep learning algorithms have achieved highly promising results on VSR tasks, they were primarily trained to detect speech in English, as most existing training datasets only include English speech. This limits their potential user base to people who live or work in English-speaking contexts. Researchers at Imperial College London have recently developed a new model that can tackle VSR tasks in multiple languages.


A Cartoon Guide to Language Models in NLP (Part 1: Intuition)

#artificialintelligence

(This is a crosspost from the official Surge AI blog. If you need help with data labeling and NLP, say hello!) Language models are a core component of NLP systems, from machine translation to speech…


Study finds that even the best speech recognition systems exhibit bias - Dataconomy

#artificialintelligence

This article originally appeared on VentureBeat and is reproduced with permission. Even state-of-the-art automatic speech recognition (ASR) algorithms struggle to recognize the accents of people from certain regions of the world. That's the top-line finding of a new study published by researchers at the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology, which found that an ASR system for the Dutch language recognized speakers of specific age groups, genders, and countries of origin better than others. Speech recognition has come a long way since IBM's Shoebox machine and Worlds of Wonder's Julie doll. But despite progress made possible by AI, voice recognition systems today are at best imperfect -- and at worst discriminatory.


Study finds that even the best speech recognition systems exhibit bias

#artificialintelligence

Even state-of-the-art automatic speech recognition (ASR) algorithms struggle to recognize the accents of people from certain regions of the world. That's the top-line finding of a new study published by researchers at the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology, which found that an ASR system for the Dutch language recognized speakers of specific age groups, genders, and countries of origin better than others. Speech recognition has come a long way since IBM's Shoebox machine and Worlds of Wonder's Julie doll. But despite progress made possible by AI, voice recognition systems today are at best imperfect -- and at worst discriminatory. In a study commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30% less likely to understand non-American accents than those of native-born users.


Even the Best Speech Recognition Systems Exhibit Bias, Study Finds - Slashdot

#artificialintelligence

An anonymous reader quotes a report from VentureBeat: Even state-of-the-art automatic speech recognition (ASR) algorithms struggle to recognize the accents of people from certain regions of the world. That's the top-line finding of a new study published by researchers at the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology, which found that an ASR system for the Dutch language recognized speakers of specific age groups, genders, and countries of origin better than others. The coauthors of this latest research set out to investigate how well an ASR system for Dutch recognizes speech from different groups of speakers. In a series of experiments, they observed whether the ASR system could contend with diversity in speech along the dimensions of gender, age, and accent. The researchers began by having an ASR system ingest sample data from CGN, an annotated corpus used to train AI language models to recognize the Dutch language.


The Ultimate Guide To Speech Recognition With Python – Real Python

@machinelearnbot

Have you ever wondered how to add speech recognition to your Python project? If so, then keep reading! It's easier than you might think. Far from being a fad, the overwhelming success of speech-enabled products like Amazon Alexa has proven that some degree of speech support will be an essential aspect of household tech for the foreseeable future. If you think about it, the reasons why are pretty obvious. Incorporating speech recognition into your Python application offers a level of interactivity and accessibility that few technologies can match. The accessibility improvements alone are worth considering. Speech recognition allows the elderly and the physically and visually impaired to interact with state-of-the-art products and services quickly and naturally--no GUI needed! Best of all, including speech recognition in a Python project is really simple. In this guide, you'll find out how. In the end, you'll apply what you've learned to a simple "Guess the Word" game and see how it all comes together.
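A game like the one the guide describes ultimately calls a speech-recognition backend (for example, the SpeechRecognition library's `Recognizer.recognize_google`). The sketch below is a hedged illustration, not the guide's code: it keeps the game logic separate from the recognition call by injecting the transcriber as a callable, so the same logic works with a real microphone or a stub.

```python
import random
from typing import Callable, Sequence

def guess_the_word(get_transcript: Callable[[], str],
                   words: Sequence[str] = ("apple", "banana", "grape"),
                   attempts: int = 3) -> bool:
    """Pick a secret word and give the player `attempts` spoken guesses.

    `get_transcript` stands in for a real speech-recognition call; injecting
    it keeps the game testable without a microphone or network access.
    """
    secret = random.choice(words)
    for _ in range(attempts):
        guess = get_transcript().strip().lower()
        if guess == secret:
            print(f"Correct! The word was {secret!r}.")
            return True
        print(f"{guess!r} is not the word, try again.")
    print(f"Out of attempts. The word was {secret!r}.")
    return False
```

In a real application, `get_transcript` might record from the microphone and return the recognizer's transcript; here it is an assumption standing in for that step.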


Why Is Speech Recognition Technology So Difficult to Perfect?

Huffington Post - Tech news and opinion

This is an excellent question to start off an automatic speech recognition (ASR) interview. I would slightly rephrase the question as "Why is speech recognition hard?" ASR is just like any other machine learning (ML) problem, where the objective is to classify a sound wave into one of the basic units of speech (also called a "class" in ML terminology), such as a word. The problem with human speech is the huge amount of variation that occurs while pronouncing a word. For example, below are two recordings of the word "Yes" spoken by the same person (wave source: AN4 dataset [1]).
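The classification framing above can be illustrated with a toy nearest-template classifier over made-up feature vectors (real ASR front-ends use far richer features, such as MFCCs, and statistical or neural models rather than a single template per word). The point is that two slightly different renditions of "Yes" should still land in the same class:

```python
import math

# Toy templates: one made-up feature vector (e.g., averaged spectral
# energies in three bands) per word. These numbers are illustrative only.
TEMPLATES = {
    "yes": [0.9, 0.2, 0.1],
    "no":  [0.1, 0.3, 0.8],
}

def classify(features: list[float]) -> str:
    """Return the word whose template is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(TEMPLATES, key=lambda word: dist(features, TEMPLATES[word]))
```

Pronunciation variation means the feature vectors for two recordings of the same word never match exactly; a usable classifier has to absorb that variation, which is exactly what makes the problem hard.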


Why Isn't Voice Recognition Software More Accurate?

Forbes - Tech

This is an excellent question to start off an automatic speech recognition (ASR) interview. I would slightly rephrase the question as "Why is speech recognition hard?" ASR is just like any other machine learning (ML) problem, where the objective is to classify a sound wave into one of the basic units of speech (also called a "class" in ML terminology), such as a word. The problem with human speech is the huge amount of variation that occurs while pronouncing a word. For example, below are two recordings of the word "Yes" spoken by the same person (wave source: AN4 dataset [1]).